Incremental Shared Nearest Neighbor Density-Based Clustering Algorithms for Dynamic Datasets

Author

Panthadeep Bhattacharjee

Abstract

Dynamic datasets undergo frequent changes in which a small number of data points are added or deleted. Such datasets are frequently encountered in real-world applications such as search engines and recommender systems. Incremental data mining algorithms process these updates efficiently by avoiding redundant computation. Shared nearest neighbor density-based clustering (SNN-DBSCAN) is a widely used clustering algorithm, mainly because of its robustness. The existing incremental extension to SNN-DBSCAN cannot handle deletions from the dataset and handles insertions only one point at a time. We overcome both of these bottlenecks by efficiently identifying the affected parts of clusters while processing updates to the dataset in batch mode. We present three incremental algorithms that differ in how effectively they eliminate redundant computation. We demonstrate the effectiveness of our algorithms through experiments on large synthetic and real-world datasets. Our algorithms are up to two orders of magnitude faster than the non-incremental algorithm and up to 50 times faster than the existing incremental algorithm, while guaranteeing exactly the same output.
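
To make the idea of "affected parts" concrete, the following is a minimal Python sketch, not the paper's algorithms. It assumes the common definition of shared nearest neighbor similarity as the overlap between two points' k-nearest-neighbor lists, and it finds the points whose lists change after a batch insertion by brute-force recomputation. The function names `knn_lists`, `snn_similarity`, and `affected_by_insert` are hypothetical and chosen only for illustration.

```python
from math import dist


def knn_lists(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    lists = []
    for i, p in enumerate(points):
        # Sort the other points by distance; break ties by index for determinism.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        lists.append(frozenset(others[:k]))
    return lists


def snn_similarity(lists, i, j):
    """Assumed SNN similarity: number of neighbors shared by the two k-NN lists."""
    return len(lists[i] & lists[j])


def affected_by_insert(points, new_points, k):
    """Indices of existing points whose k-NN list changes after a batch insert.

    Brute force for illustration only: an incremental algorithm would locate
    these points without recomputing every list.
    """
    before = knn_lists(points, k)
    after = knn_lists(points + new_points, k)
    return {i for i in range(len(points)) if before[i] != after[i]}


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    lists = knn_lists(pts, k=2)
    print(snn_similarity(lists, 0, 1))
    print(affected_by_insert(pts, [(10.5, 10.5)], k=2))
```

In this toy dataset only the points near the inserted one have their neighbor lists changed, so only their similarities and cluster memberships would need to be revisited. The far-away group is untouched, and that is the redundant computation an incremental approach aims to skip.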

Similar resources

Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets

Incremental data mining algorithms process frequent updates to dynamic datasets efficiently by avoiding redundant computation. The existing incremental extension to the shared nearest neighbor density-based clustering (SNND) algorithm cannot handle deletions from the dataset and handles insertions only one point at a time. We present an incremental algorithm to overcome both these bottlenecks by efficiently ...

Coherent Gene Expression Pattern Finding Using Clustering Approaches

Analysis of gene expression data is an important research field in DNA microarray research. Data mining techniques have proven to be useful in understanding gene function, gene regulation, cellular processes and subtypes of cells. Most data mining algorithms developed for gene expression data deal with the problem of clustering. The purpose of this thesis is to study different clustering approa...

A Survey Paper on Data Clustering using Incremental Affine Propagation

Clustering is a vital part of data mining and is widely used in different applications. In this project we focus on affinity propagation (AP) clustering, which was recently presented to overcome many clustering problems in different applications. Many clustering applications are based on static data. The AP clustering approach supports only static data applications, he...

Software Cost Estimation by a New Hybrid Model of Particle Swarm Optimization and K-Nearest Neighbor Algorithms

A successful software project should be completed within a predetermined cost and time. Software is a product whose cost is mainly that of an expert workforce and professionals. The most important component of software cost estimation (SCE) is related to the trained workforce. The creative and abstract nature of software projects makes the cost and time of projects extremely difficult ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

Publication date: 2016
